This tutorial is still under development and may change slightly as it is being edited.
This tutorial is relevant to the published paper RNA exploits an exposed regulatory site to inhibit the enzymatic activity of PRC2. However, the workflow can be used for data processed by MaxQuant. The data for this tutorial is available here.The relevant files are located within the .zip files for each repeat as MaxQuant_txt_folder_output/peptides.txt. For simplicity, all “peptides.txt” files have been re-named according to their experiment numbers as part of the external data available as part of the R package.
In this tutorial, we w
First, load the libraries needed for the package and tutorial.
library(ggplot2)
library(bio3d)
library(Biostrings)
library(seqinr)
library(RColorBrewer)
library(openxlsx)
library(viridis)
library(stringr)
library(svglite)
library(crisscrosslinker)
When loading the data, it is important to make sure that the files for input are named well so the functions will be able to correctly identify and organize the data.
sys.path <- system.file("extdata/NSMB_RBDmap",package = 'crisscrosslinker',mustWork = TRUE)
sys.files <- list.files(sys.path)
sys.files
## [1] "Repeat01_NoXL_ArgC_eluate.txt"
## [2] "Repeat01_NoXL_ArgC_input.txt"
## [3] "Repeat01_NoXL_LysC_eluate.txt"
## [4] "Repeat01_NoXL_LysC_input.txt"
## [5] "Repeat01_UV_ArgC_eluate.txt"
## [6] "Repeat01_UV_ArgC_input.txt"
## [7] "Repeat01_UV_LysC_eluate.txt"
## [8] "Repeat01_UV_LysC_input.txt"
## [9] "Repeat02_NoXL_ArgC_eluate_MaxQuant_txt.txt"
## [10] "Repeat02_NoXL_ArgC_input_MaxQuant_txt.txt"
## [11] "Repeat02_NoXL_LysC_eluate_MaxQuant_txt.txt"
## [12] "Repeat02_NoXL_LysC_input_MaxQuant_txt.txt"
## [13] "Repeat02_UV_ArgC_eluate_MaxQuant_txt.txt"
## [14] "Repeat02_UV_ArgC_input_MaxQuant_txt.txt"
## [15] "Repeat02_UV_LysC_eluate_MaxQuant_txt.txt"
## [16] "Repeat02_UV_LysC_input_MaxQuant_txt.txt"
## [17] "Repeat03_NoXL_ArgC_eluate_MaxQuant_txt.txt"
## [18] "Repeat03_NoXL_ArgC_input_MaxQuant_txt.txt"
## [19] "Repeat03_NoXL_LysC_eluate_MaxQuant_txt.txt"
## [20] "Repeat03_NoXL_LysC_input_MaxQuant_txt.txt"
## [21] "Repeat03_UV_ArgC_eluate_MaxQuant_txt.txt"
## [22] "Repeat03_UV_ArgC_input_MaxQuant_txt.txt"
## [23] "Repeat03_UV_LysC_eluate_MaxQuant_txt.txt"
## [24] "Repeat03_UV_LysC_input_MaxQuant_txt.txt"
## [25] "Repeat04_NoXL_ArgC_eluate_MaxQuant_txt.txt"
## [26] "Repeat04_NoXL_ArgC_input_MaxQuant_txt.txt"
## [27] "Repeat04_NoXL_LysC_eluate_MaxQuant_txt.txt"
## [28] "Repeat04_NoXL_LysC_input_MaxQuant_txt.txt"
## [29] "Repeat04_UV_ArgC_eluate_MaxQuant_txt.txt"
## [30] "Repeat04_UV_ArgC_input_MaxQuant_txt.txt"
## [31] "Repeat04_UV_LysC_eluate_MaxQuant_txt.txt"
## [32] "Repeat04_UV_LysC_input_MaxQuant_txt.txt"
Each of the file names contains 3 pieces of information: the name of the experiment, if it is an input/eluate file, and which protease was used for the experiment.
head(read.table(paste0(sys.path,'/',sys.files)[1], sep='\t',header = TRUE))
## Sequence N.term.cleavage.window
## 1 CLSYYQLDGSNTIQCIK REKDFYRSGEQVAFKCLSYYQLDGSNTIQC
## 2 DAEEWFFTKTEELNR EMRDQYEKMAEKNRKDAEEWFFTKTEELNR
## 3 DDTKCLASIAK NFCLFQSNSKDLLFRDDTKCLASIAKKTYD
## 4 DLDAECLHRTELETK SKRTDMEFTFVQLKKDLDAECLHRTELETK
## 5 FSPNVHGLIGQFMNEPK INVDFLGIYIPPTNKFSPNVHGLIGQFMNE
## 6 GGRQGSYHEQSVDRSGHSGSHHSHTTSQGR QEGQDTIRGHPGSRRGGRQGSYHEQSVDRS
## C.term.cleavage.window Amino.acid.before First.amino.acid
## 1 SYYQLDGSNTIQCIKSKWIGRPACRDVSCG K C
## 2 DAEEWFFTKTEELNREVATNTEALQSSRTE K D
## 3 LLFRDDTKCLASIAKKTYDSYLGDDYVRAM R D
## 4 DLDAECLHRTELETKLKSLESFVELMKTIY K D
## 5 PNVHGLIGQFMNEPKIHVFNERPGKDPEKP K F
## 6 GHSGSHHSHTTSQGRSDASHGQSGSRSASR R G
## Second.amino.acid Second.last.amino.acid Last.amino.acid
## 1 L I K
## 2 A N R
## 3 D A K
## 4 L T K
## 5 S P K
## 6 G G R
## Amino.acid.after A.Count R.Count N.Count D.Count C.Count Q.Count E.Count
## 1 S 0 0 1 1 2 2 0
## 2 E 1 1 1 1 0 0 4
## 3 K 2 0 0 2 1 0 0
## 4 L 1 1 0 2 1 0 3
## 5 I 0 0 2 0 0 1 1
## 6 S 0 3 0 1 0 3 1
## G.Count H.Count I.Count L.Count K.Count M.Count F.Count P.Count S.Count
## 1 1 0 2 2 1 0 0 0 2
## 2 0 0 0 1 1 0 2 0 0
## 3 0 0 1 1 2 0 0 0 1
## 4 0 1 0 3 1 0 0 0 0
## 5 2 1 1 1 1 1 2 2 1
## 6 6 5 0 0 0 0 0 0 7
## T.Count W.Count Y.Count V.Count U.Count O.Count Length Missed.cleavages
## 1 1 0 2 0 0 0 17 0
## 2 2 1 0 0 0 0 15 1
## 3 1 0 0 0 0 0 11 1
## 4 2 0 0 0 0 0 15 1
## 5 0 0 0 1 0 0 17 0
## 6 2 0 1 1 0 0 30 2
## Mass Proteins Leading.razor.protein Start.position
## 1 2061.950 CON__Q28085 CON__Q28085 1026
## 2 1913.880 CON__Q6IFX2;CON__P02533 CON__Q6IFX2 280
## 3 1220.607 CON__Q29443;CON__Q0IIK2 CON__Q29443 640
## 4 1828.862 CON__Q6KB66-1 CON__Q6KB66-1 191
## 5 1913.946 CON__Q9TRI1 CON__Q9TRI1 852
## 6 3215.434 CON__P20930 CON__P20930 3767
## End.position Unique..Groups. Unique..Proteins. Charges PEP
## 1 1042 yes yes 4 0.00961790
## 2 294 yes no 4 0.00023835
## 3 650 yes no 3 0.00810090
## 4 205 yes yes 4 0.00023835
## 5 868 yes yes 4 0.00961790
## 6 3796 yes yes 4 0.01511400
## Score Intensity Reverse Potential.contaminant id Protein.group.IDs
## 1 3.60420 623060000 NA + 0 14
## 2 4.10180 331750000 NA + 1 5
## 3 0.00000 270080 NA + 2 12
## 4 1.18580 10152000 NA + 3 20
## 5 0.90048 890910 NA + 4 24
## 6 1.75540 158400000 NA + 5 8
## Mod..peptide.IDs Evidence.IDs MS.MS.IDs Best.MS.MS
## 1 0 0 0 0
## 2 1 1 1 1
## 3 2 2 2 2
## 4 3 3 3 3
## 5 4 4 4 4
## 6 5 5 5 5
## Oxidation..M..site.IDs MS.MS.Count
## 1 NA 0
## 2 NA 0
## 3 NA 0
## 4 NA 0
## 5 NA 0
## 6 NA 1
If comparing between multiple experiments, indicate the experiment names in the code.
#experiment_names <- c('experiment1','experiment2')
experiment_names <- c('Repeat02_NoXL','Repeat02_UV')
We will also need to load the FASTA file that was used during the initial analysis to get the tryptic peptides. For more information on how this FASTA file was generated, please refer to the paper.
fasta_file <- seqinr::read.fasta(system.file("extdata/NSMB_FASTA",'PRC2_5m.fasta',package = 'crisscrosslinker',mustWork = TRUE))
If the file format is from MaxQuant with an Andromeda score, you will need to set it. The default setting is 20 and we will explicitly set it in this demonstration.
cutoff_score <- 20
We are now ready to get a list of the relevant hits for our data.
sequence_hit_list <- rbd.makeSeqHitList(fasta_file = fasta_file,
file_format = 'txt',
cutoff_score = cutoff_score,
experiment_directory = sys.path)
sequence_hit_list[[2]]
## $sequence
## [1] "ASMSEFLESEDGEVEQQR" "ATWETILDGK"
## [3] "DEVLSADYDLLGEK" "DVSCPIR"
## [5] "EAAFDDAVEER" "ECDPDLCLTCGAADHWDSK"
## [7] "ECSVTSDLDFPTQVIPLK" "ESLTTDLQTR"
## [9] "ESSIIAPAPAEDVDTPPR" "EVSTAPAGTDMPAAK"
## [11] "FDYSQCDIWYMRFSMDFWQKMLALGNQVGK" "FSMDFWQK"
## [13] "HPSKPDPSGECNPDLR" "IHFPDFSTR"
## [15] "IKPSESNVTILGR" "LMIWDTR"
## [17] "LQLLDGEYEVAMQEMEECPISK" "NFMLHLVSMHDFNLISIMSIDK"
## [19] "NSESLHQENKPGSVK" "NSESLHQENKPGSVKPTQTIAVK"
## [21] "PDPSGECNPDLR" "PSTPTINVLESK"
## [23] "SLSPGAASSSSGDGDGK" "STPAMMNGQGSTTSSSK"
## [25] "TEILNQEWK" "VINEEYK"
## [27] "VMMVNGDHR" "WLGDLILSK"
## [29] "YSQADALK"
##
## $intensity
## [1] 1418600000 1513200000 199910000 208170000 578880000 24039000
## [7] 90662000 1032800000 353530000 955100000 9762200 616470000
## [13] 235430000 540690000 269540000 248550000 3419100000 9372700
## [19] 25135000 168310000 3666600 372870000 27224000 33699000
## [25] 577730000 197920000 81929000 2322600000 217840000
##
## $protein
## [1] "SUZ12_Q15022" "SUZ12_Q15022" "EED_O75530" "SUZ12_Q15022"
## [5] "RBBP4_Q09028" "EZH2_Q15910-2" "EZH2_Q15910-2" "SUZ12_Q15022"
## [9] "EZH2_Q15910-2" "EED_O75530" "EED_O75530" "EED_O75530"
## [13] "RBBP4_Q09028" "EED_O75530" "EED_O75530" "RBBP4_Q09028"
## [17] "SUZ12_Q15022" "SUZ12_Q15022" "SUZ12_Q15022" "SUZ12_Q15022"
## [21] "RBBP4_Q09028" "EZH2_Q15910-2" "AEBP2_Q6ZN18" "AEBP2_Q6ZN18"
## [25] "EZH2_Q15910-2" "RBBP4_Q09028" "EZH2_Q15910-2" "EED_O75530"
## [29] "EZH2_Q15910-2"
##
## $amino_acid_before
## [1] "K" "R" "R" "K" "K" "R" "R" "K" "K" "R" "R" "R" "K" "K" "K" "K" "R"
## [18] "R" "R" "R" "K" "R" "R" "R" "R" "R" "K" "R" "R"
##
## $amino_acid_after
## [1] "R" "K" "K" "R" "R" "K" "K" "R" "R" "K" "K" "K" "R" "R" "R" "R" "K"
## [18] "K" "K" "K" "R" "K" "K" "K" "K" "K" "R" "K" "K"
Here we have found all of the tryptic peptides that:
Since we want to compare the input and eluate data from the experiments, let’s make a table comparing the two.
input_eluate_table <- rbd.makeIETable(sequence_hit_list, experiment_names)
We can see how the results look here:
head(input_eluate_table)
## sequence input
## Repeat02_NoXL_ArgC.ASMSEFLESEDGEVEQQR ASMSEFLESEDGEVEQQR 288150000
## Repeat02_NoXL_ArgC.ATWETILDGK ATWETILDGK 2490900000
## Repeat02_NoXL_ArgC.DEVLSADYDLLGEK DEVLSADYDLLGEK 309200000
## Repeat02_NoXL_ArgC.DPNLLLSVSK DPNLLLSVSK 49414000
## Repeat02_NoXL_ArgC.EAAFDDAVEER EAAFDDAVEER 1168100000
## Repeat02_NoXL_ArgC.ECDPDLCLTCGAADHWDSK ECDPDLCLTCGAADHWDSK 149440000
## eluate category
## Repeat02_NoXL_ArgC.ASMSEFLESEDGEVEQQR 0 Repeat02_NoXL_ArgC
## Repeat02_NoXL_ArgC.ATWETILDGK 0 Repeat02_NoXL_ArgC
## Repeat02_NoXL_ArgC.DEVLSADYDLLGEK 0 Repeat02_NoXL_ArgC
## Repeat02_NoXL_ArgC.DPNLLLSVSK 0 Repeat02_NoXL_ArgC
## Repeat02_NoXL_ArgC.EAAFDDAVEER 0 Repeat02_NoXL_ArgC
## Repeat02_NoXL_ArgC.ECDPDLCLTCGAADHWDSK 0 Repeat02_NoXL_ArgC
## protein aa_before aa_after
## Repeat02_NoXL_ArgC.ASMSEFLESEDGEVEQQR SUZ12_Q15022 K R
## Repeat02_NoXL_ArgC.ATWETILDGK SUZ12_Q15022 R K
## Repeat02_NoXL_ArgC.DEVLSADYDLLGEK EED_O75530 R K
## Repeat02_NoXL_ArgC.DPNLLLSVSK EED_O75530 R K
## Repeat02_NoXL_ArgC.EAAFDDAVEER RBBP4_Q09028 K R
## Repeat02_NoXL_ArgC.ECDPDLCLTCGAADHWDSK EZH2_Q15910-2 R K
We can now move on to making a plot.
As you can see, many of the hits that were found in the input were not found in the eluate. We will filter those later. For now, let’s plot everything.
input_eluate_graph <- rbd.makeIEPlot(input_eluate_table, experiment_names)
print(input_eluate_graph)
We can change the default settings of the plot, including colors and the title by inputting some more variables:
name_of_experiment <- 'RBDmap Experiments: Input vs. Eluate'
input_eluate_graph <- rbd.makeIEPlot(input_eluate_table,
experiment_names,
palette = 'Pastel1',
fill = 'white',
colour = 'black',
size = 1,
title = name_of_experiment)
print(input_eluate_graph)
Now that we’ve compared our control, let’s remove all results that do not show up in the eluate files.
filtered_input_eluate_table <- input_eluate_table[input_eluate_table$eluate > 0,]
head(filtered_input_eluate_table)
## sequence input
## Repeat02_UV_ArgC.ECSVTSDLDFPTQVIPLK ECSVTSDLDFPTQVIPLK 1101400000
## Repeat02_UV_ArgC.WLGDLILSK WLGDLILSK 1180000000
## Repeat02_UV_LysC.ECSVTSDLDFPTQVIPLK ECSVTSDLDFPTQVIPLK 1649500000
## Repeat02_UV_LysC.LQLLDGEYEVAMQEMEECPISK LQLLDGEYEVAMQEMEECPISK 1057200000
## Repeat02_UV_LysC.WLGDLILSK WLGDLILSK 1288100000
## eluate category
## Repeat02_UV_ArgC.ECSVTSDLDFPTQVIPLK 13380000 Repeat02_UV_ArgC
## Repeat02_UV_ArgC.WLGDLILSK 14292000 Repeat02_UV_ArgC
## Repeat02_UV_LysC.ECSVTSDLDFPTQVIPLK 16849000 Repeat02_UV_LysC
## Repeat02_UV_LysC.LQLLDGEYEVAMQEMEECPISK 8763700 Repeat02_UV_LysC
## Repeat02_UV_LysC.WLGDLILSK 22946000 Repeat02_UV_LysC
## protein aa_before aa_after
## Repeat02_UV_ArgC.ECSVTSDLDFPTQVIPLK EZH2_Q15910-2 R K
## Repeat02_UV_ArgC.WLGDLILSK EED_O75530 R K
## Repeat02_UV_LysC.ECSVTSDLDFPTQVIPLK EZH2_Q15910-2 R K
## Repeat02_UV_LysC.LQLLDGEYEVAMQEMEECPISK SUZ12_Q15022 R K
## Repeat02_UV_LysC.WLGDLILSK EED_O75530 R K
We’ll use these results for finding binding sequences. We’ll first get a data.frame of the binding sequences aligned to the FASTA file.
bs_output_fasta <- rbd.getBSfromIET(filtered_input_eluate_table,
fasta_file = fasta_file,
align_to = 'FASTA')
head(bs_output_fasta[,1:6])
## binding_site_start binding_site_end
## 1 104 165
## 2 327 359
## 3 66 85
## 4 310 312
## 5 298 317
## binding_sequence
## 1 TLNAVASVPIMYSWSPLQQNFMVEDETVLHNIPYMGDEVLDQDGTFIEELIKNYDGKVHGDR
## 2 SCENAIVCWKPGKMEDDIDKIKPSESNVTILGR
## 3 QRRIQPVHILTSVSSLRGTR
## 4 NRR
## 5 IHFPDFSTRDIHRNYVDCVR
## ms2_peptide_seq ms2_start ms2_end
## 1 ECSVTSDLDFPTQVIPLK 86 103
## 2 WLGDLILSK 318 326
## 3 ECSVTSDLDFPTQVIPLK 86 103
## 4 LQLLDGEYEVAMQEMEECPISK 313 334
## 5 WLGDLILSK 318 326
Let’s make a heatmap and sort the sequences by their names.
bs_freqVector <- rbd.freqVector(bs_output = bs_output_fasta,
name_by = 'pro_name',
heatmap = TRUE,
db_selection = 'FASTA')
## Saving 7 x 5 in image
We can also adjust the colors of the heatmap if we so wish. You can put in a standard RColorBrewer or viridis palette or a list of hexcodes and/or standard R Colors.
bs_freqVector <- rbd.freqVector(bs_output = bs_output_fasta,
name_by = 'pro_name',
heatmap = TRUE,
db_selection = 'FASTA',
colors = "Blues",
save_plot = FALSE)
## Warning in brewer.pal(num, colors): minimal value for n is 3, returning requested palette with 3 different levels
#do a grid of different color options?
bs_freqVector <- rbd.freqVector(bs_output = bs_output_fasta,
name_by = 'pro_name',
heatmap = TRUE,
db_selection = 'FASTA',
colors = c('#b3ecec','#43e8d8','#48d1cc'),
save_plot = FALSE)
bs_freqVector <- rbd.freqVector(bs_output = bs_output_fasta,
name_by = 'pro_name',
heatmap = TRUE,
db_selection = 'FASTA',
colors = c("#465ca8","#7767b8","#a272c4",
"#cc7ecc",'#f38bd1'), #palette from color-hex.com
save_plot = FALSE)
If you require further customization, we recommend using the freqVector directly within a heatmap and customizing from there.
We will now align the binding sequences to a PDB file. For simplicity, we’ll only focus on one protein: EZH2.
ezh2_iet <- filtered_input_eluate_table[filtered_input_eluate_table$protein == 'EZH2_Q15910-2',]
We already know the PDB file and chain that we want to align the proteins to. In this case, we can create and add a new table:
protein_dict <- read.csv(system.file("extdata/NSMB_misc",'PRC2_protein_dict.csv',package = 'crisscrosslinker',mustWork = TRUE))
head(protein_dict)
## protein_id uniprot_id pdb_id
## 1 EZH2_Q15910-2 Q15910-2 6C23_C
## 2 EED_O75530 O75530 6C23_L
## 3 RBBP4_Q09028 Q09028 6C23_N
## 4 SUZ12_Q15022 Q15022 6C23_A
## 5 AEBP2_Q6ZN18 Q6ZN18-2 6C23_P
Now we don’t have to go through menus to be able to define the PDB ID/chain that we need to use. This will save us time when making our table of binding sequences.
You can always leave the dictionary blank in order to access an interactive menu to be able to use BLAST in order to align your sequence to a known PDB structure.
bs_ezh2 <- rbd.getBSfromIET(ezh2_iet,
fasta_file = fasta_file,
align_to = 'PDB',
protein_dict = protein_dict)
## HEADER GENE REGULATION 05-JAN-18 6C23
## HEADER GENE REGULATION 05-JAN-18 6C23
Let’s see how many matches we had to the chain we chose:
bs_ezh2$db
## [1] "PDB" "PDB"
2 matched our chain! Let’s visualize it.
Now that we have the binding sequences aligned to a PDB file, we can create a PyMOL file.
rbd.pymol(bs_ezh2, color_by = 'binding_sequence', colors = NULL)
## No color chosen --> reverting to PyMol Tints palette
## $rect
## $rect$w
## [1] 1.105
##
## $rect$h
## [1] 0.2187342
##
## $rect$left
## [1] 0.568
##
## $rect$top
## [1] 1.432
##
##
## $text
## $text$x
## [1] 0.631 0.631
##
## $text$y
## [1] 1.322633 1.267949
##
##
## No var with value 0 detected --> coloring gray
There is an additional legend with the colors corresponding to the binding sequences. The colors will be assigned based on the order the binding sequences are in. If you would like the sequences in a different order, please order them before executing rbd.pymol().
If we do not choose any colors, it will default to the PyMOL tints palette. However, we can input any RColorBrewer or viridis palette as well as standard PyMOL/R colors or hexcodes.
par(mfrow=c(2,2))
rbd.pymol(bs_ezh2, color_by = 'binding_sequence', colors = 'Dark2', file.name = 'pymol_Dark2.pml') #RColorBrewer palette
## Warning in brewer.pal(num, colors): minimal value for n is 3, returning requested palette with 3 different levels
## $rect
## $rect$w
## [1] 2.138085
##
## $rect$h
## [1] 0.5897368
##
## $rect$left
## [1] 0.568
##
## $rect$top
## [1] 1.432
##
##
## $text
## $text$x
## [1] 0.6899 0.6899
##
## $text$y
## [1] 1.1371316 0.9896974
## Warning in Ops.factor(vars, 1): '>' not meaningful for factors
## No var with value 0 detected --> coloring gray
rbd.pymol(bs_ezh2, color_by = 'binding_sequence', colors = 'magma', file.name = 'pymol_magma.pml') #viridis palette
## $rect
## $rect$w
## [1] 2.138085
##
## $rect$h
## [1] 0.5897368
##
## $rect$left
## [1] 0.568
##
## $rect$top
## [1] 1.432
##
##
## $text
## $text$x
## [1] 0.6899 0.6899
##
## $text$y
## [1] 1.1371316 0.9896974
## Warning in Ops.factor(vars, 1): '>' not meaningful for factors
## No var with value 0 detected --> coloring gray
rbd.pymol(bs_ezh2, color_by = 'binding_sequence', colors = c('red','blue'), file.name = 'pymol_rgb.pml') #standard colors
## Warning in if ((typeof(colors) == "character") & (startsWith(colors, "#")))
## {: the condition has length > 1 and only the first element will be used
## $rect
## $rect$w
## [1] 2.138085
##
## $rect$h
## [1] 0.5897368
##
## $rect$left
## [1] 0.568
##
## $rect$top
## [1] 1.432
##
##
## $text
## $text$x
## [1] 0.6899 0.6899
##
## $text$y
## [1] 1.1371316 0.9896974
##
##
## No var with value 0 detected --> coloring gray
rbd.pymol(bs_ezh2, color_by = 'binding_sequence', colors = c("#f8a1be","#ffb6b1"), file.name = 'pymol_hexcodes.pml') #hexcodes
## Warning in if ((typeof(colors) == "character") & (startsWith(colors, "#")))
## {: the condition has length > 1 and only the first element will be used
## $rect
## $rect$w
## [1] 2.138085
##
## $rect$h
## [1] 0.5897368
##
## $rect$left
## [1] 0.568
##
## $rect$top
## [1] 1.432
##
##
## $text
## $text$x
## [1] 0.6899 0.6899
##
## $text$y
## [1] 1.1371316 0.9896974
## Warning in Ops.factor(vars, 1): '>' not meaningful for factors
## No var with value 0 detected --> coloring gray
We have now looked at one experiment, but now we want to look at multiple experiments we have run and compare between them. First we will do what we have done before, but combine the different runs together to examine how often these binding sites appear.
Each of these experiments have slighly different prefixes, so we will adjust them accordingly with a list. In this instance, we could probably get away with just using “UV” for all 3 repeats. However, for the purposes of this tutorial, we will explicitly define the prefixes.
We’re now ready to proceed to get the binding sequences for each of these repeats.
bs_output_diff_analysis <- data.frame()
num_repeats <- 1:4
repeat_names <- paste0('Repeat0',num_repeats)
for(rname in repeat_names){
shl <- sequence_hit_list[startsWith(names(sequence_hit_list),rname)]
enames <- paste0(rname,'_',c('UV','NoXL'))
input_eluate_table <- rbd.makeIETable(shl, enames)
filtered_input_eluate_table <- input_eluate_table[input_eluate_table$eluate > 0,]
bs_output_pdb <- rbd.getBSfromIET(filtered_input_eluate_table,
fasta_file = fasta_file,
align_to = 'PDB',
protein_dict = protein_dict)
bs_output_diff_analysis <- rbind(bs_output_diff_analysis,data.frame(bs_output_pdb))
}
Now that we have loaded the data into our R environment, let’s make a PyMOL file:
rbd.pymol(bs_output_diff_analysis, color_by = 'freq', colors = 'magma', file.name = 'pymol_magma_diff_analysis.pml')
## Saving 7 x 5 in image
## HEADER GENE REGULATION 05-JAN-18 6C23
## HEADER GENE REGULATION 05-JAN-18 6C23
## HEADER GENE REGULATION 05-JAN-18 6C23
## HEADER GENE REGULATION 05-JAN-18 6C23
It will also create a heatmap with the same color scheme as the PyMOL file if heatmap is kept as TRUE. A legend will also be generated to go with the PyMOL file, though it will be similar to the one generated by the heatmap.